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Physical tethering and volume exclusion determine 
higher-order genome organization in budding yeast 
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In this paper we show that tethering of heterochromatic regions to nuclear landmarks and random encounters of 
chromosomes in the confined nuclear volume are sufficient to explain the higher-order organization of the budding yeast 
genome. We have quantitatively characterized the contact patterns and nuclear territories that emerge when chromo- 
somes are allowed to behave as constrained but otherwise randomly configured flexible polymer chains in the nucleus. 
Remarkably, this constrained random encounter model explains in a statistical manner the experimental hallmarks of the 
S. cerevisiae genome organization, including [1) the folding patterns of individual chromosomes; (2) the highly enriched 
interactions between specific chromatin regions and chromosomes; [3] the emergence, shape, and position of gene ter- 
ritories; [4) the mean distances between pairs of telomeres; and (5) even the co-location of functionally related gene loci, 
including early replication start sites and tRNA genes. Therefore, most aspects of the yeast genome organization can be 
explained without calling on biochemically mediated chromatin interactions. Such interactions may modulate the pre- 
existing propensity for co-localization but seem not to be the cause for the observed higher-order organization. The fact 
that geometrical constraints alone yield a highly organized genome structure, on which different functional elements are 
specifically distributed, has strong implications for the folding principles of the genome and the evolution of its function. 

[Supplemental material is available for this article.] 



The structural organization of the genome in its nuclear environ- 
ment is a key factor in the correct execution of nuclear functions 
(Misteli 2007; Takizawa et al. 2008; Taddei et al. 2010). For in- 
stance, in budding yeast, heterochromatic regions such as telo- 
meres and silent mating-type loci are silenced by anchoring them 
to the nuclear envelope (NE), presumably through heterochro- 
matin protein factors (Gotta et al. 1996; Hediger et al. 2002; Taddei 
et al. 2004, 2009; Mekhail and Moazed 2010; Horigome et al. 
2011). For some other genes, the location at the NE has also been 
proposed to play a major role in their transcriptional repression 
(Csink and Henikoff 1996; Dernburg et al. 1996; Maillet et al. 1996; 
Brown et al. 1997; Cockell and Gasser 1999; Towbin et al. 2009). 
However, other genes relocate to the NE upon transcriptional ac- 
tivation (Casolari et al. 2004; Cabal et al. 2006), presumably in- 
stigated by forming interactions with nuclear pore complexes, fa- 
cilitating mRNA export to maximize cellular transcription levels. 

The spatial clustering of functionally related loci is also a key 
characteristic of genome organization. In budding yeast, all het- 
erochromatic centromeres are located in a distinct region of the 
nucleus. This occurs because throughout interphase they remain 
attached through microtubules to the spindle pole body (SPB) 
(O'Toole et al. 1999; Jin et al. 2000). On the other hand, ribosomal 
DNA (rDNA) repeats appear to be clustered at the NE, opposite to 
the SPB in the nucleus (Yang et al. 1989; Dvorkin et al. 1991; 
Bystricky et al. 2005). There they form the core of a distinct sub- 
nuclear compartment named the nucleolus, which is the site of 
RNA pol-I-mediated rDNA transcription and ribosome biogenesis 
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(Yang et al. 1989; Bystricky et al. 2005; Berger et al. 2008; Mekhail 
et al. 2008; Mekhail and Moazed 2010; Taddei et al. 2010). 

There is also growing evidence for a territorial organization of 
the chromosomes in yeast (Bystricky et al. 2004, 2005; Schober 
et al. 2008). Large-scale fluorescence imaging experiments on 
budding yeast have revealed that several individual gene loci are 
strongly confined into distinct "gene territories" (Berger et al. 2008; 
Therizols et al. 2010). Also, several genome-wide conformation 
capture experiments have revealed highly structured chromatin 
contact patterns: Some chromosome pairs were found to interact 
rarely, while others interact more often than expected (Rodley et al. 
2009; Duan et al. 2010). The contact patterns of chromosomes 3 
and 6 in budding yeast agree with a Rabl-like configuration: Both 
chromosomes appear to be folded backward from their centromeres, 
so that their telomeres are juxtaposed Qin et al. 2000; Dekker et al. 
2002; Bystricky et al. 2005; Schober et al. 2008). Such a configura- 
tion and the resulting territorial chromosome organizations have 
been previously observed in live fluorescence imaging experiments 
(Bystricky et al. 2004, 2005; Schober et al. 2008; Taddei et al. 2010). 

At the same time, there is ample evidence that the structure of 
the genome is highly dynamic (Marshall et al. 1997; Heun et al. 
2001). Fluorescence imaging shows considerable cell-to-cell varia- 
tions of gene and chromosome locations (Ferguson and Ward 1992; 
Csink and Henikoff 1998; Heun et al. 2001; Berger et al. 2008). Also 
chromosome contacts are observed over a wide range of frequen- 
cies, indicating that not all contacts can be present simultaneously 
(Dekker et al. 2002; Lieberman-Aiden et al. 2009; Duan et al. 2010; 
Kalhor et al. 2012; Misteli 2012). 

Some intrachromosomal contact probabilities are consistent 
with a diffusion-driven contact formation (Cook and Marenduzzo 
2009; de Nooijer et al. 2009; Mateos-Langerak et al. 2009; Bohn 
and Heermann 2010; Dorier and Stasiak 2010). However, purely ran- 
dom chain behavior as studied in isolated model chromosomes 
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cannot explain many of the specific patterns observed in experi- 
ments (Rosa et al. 2010). 

To fairly assess the principles of chromosome folding and the 
possible role of molecular interactions in establishing nuclear order, 
we must first examine the genome structure that arises when 
chromosomes are tethered but otherwise randomly configured in 
the confinement of the nuclear environment. Previous work 
points toward an important role of nuclear constraints and relative 
chromosome arm lengths in genome organization (Berger et al. 
2008; Taddei et al. 2010; Zimmer and Fabre 2011) as shown for the 
dynamic relationship of subtelomeric regions (Therizols et al. 
2010). However, it remains to be seen if entirely random configu- 
rations of tethered chromosomes are sufficient to reproduce in 
a statistical manner all the available quantitative data about the 
yeast genome organization and gene loci interactions, including the 
highly structured contact frequency maps from genome-wide con- 
formation capture experiments (Dekker et al. 2002; Duan et al. 
2010), the distribution of gene territories from fluorescence imaging 
(Berger et al. 2008; Therizols et al. 2010; Zimmer and Fabre 2011), 
and the clustering of replication start sites as well as tRNA genes. 

Our findings demonstrate that purely random configurations 
of tethered chromosomes do indeed reproduce in a statistical man- 
ner a wide range of data related to genome structure: genome-wide 
chromatin interaction frequencies; the emergence, shape, and lo- 
cation of specific gene territories; the relative distances between 
telomeres; and even the spatial clustering of functionally related 
chromosome regions such as early replication start sites and tRNA 
gene loci. Specific molecular interactions between chromatin re- 
gions, although possible, are not required to explain the available 
experimental data on the higher-order genome organization. More- 
over, the large structural variability among individual cell's genome 
configurations indicates that no single average genome structure can 
adequately reflect the wide range of structural features relevant to 
a population of cells. 

Results 

Population modeling for determining the three-dimensional 
organization of the genome 

To address the challenge of representing highly variable genome 
structures, we construct a large population of three-dimensional 
(3D) genome structures, which represent a spectrum of all possible 
chromosome configurations, and interpret the result in terms of 



probabilities of a sample drawn from a population of heterogeneous 
structures (Methods). 

All chromosomes are modeled as random configurations that 
are subject to the following constraints: (1) All chromosomes are 
confined in the nucleus; (2) all the centromeres are attached to the 
SPB through microtubules; (3) all the telomeres are located near 
the nuclear periphery; and (4) the nucleolus is inaccessible to 
chromosomes, except for those regions containing rDNA repeats 
(Methods) (Table 1; Fig. 1). 

To generate a population of genome structures, we defined an 
optimization problem (Methods). In order to sample a representa- 
tive set of all possible structures, we created a sample of 200,000 in- 
dependently optimized genome structures, hereafter referred to as 
the structure population. We also generated a control population 
with an identical setup but without imposing any landmark con- 
straints (Methods), referred to as the random control. We also cal- 
culated a structure population for a nucleus containing only a single 
chromosome, constrained in a manner identical to the full simu- 
lation. We refer to this population as the single chromosome 
population. 

Probabilistic analysis of chromosome structural features 

In the following sections, we analyze the spatial properties of the 
structure population in terms of several statistical quantities: (1) 
chromosome territory locations, (2) chromosome and gene loci 
interaction frequencies, (3) locus localization probabilities, (4) telo- 
mere distance distributions, (5) physical proximity of functionally 
associated genomic loci including early and late replication origins, 
and tRNA genes. Each property of the simulated structure pop- 
ulation will be compared with available experimental data. 

Chromosome territories as a result of constrained random encounters 

We first ask to what extent the landmark constraints lead to pre- 
ferred chromosome locations. We calculate the probability that 
each chromosome occupies any given region of the nucleus (i.e., 
the localization probability density [LPD] of a chromosome) (Sup- 
plemental Material). Based on the LPD, it is evident that all the 
chromosomes have preferred regions. Smaller chromosomes (e.g., 
chromosome 1 in Fig. 2) reside preferentially around the central 
axis, near the SPB. Interestingly medium-sized chromosomes are 
more likely to reside away from the central axis (e.g., chromosome 8 
in Fig. 2A,B; Supplemental Fig. 1), while for large chromosomes 
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All restraints are expressed as harmonic functions (i/ 1 ), as well as harmonic upper (u ub ) and lower bounds (u lb ), respectively: u h (r/, |m) = J/c(|r, - p\ - d) 2 , 
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point, d = reference distance, k = harmonic constant, and N = the total number of beads in a model. We also define several subsets of beads that share 
certain properties. More specifically, a is the set of beads assigned to the last bead of every chain, /3 is the set of beads assigned to centromeric regions, y is 
the set of beads assigned to telomeric regions, and 8 is the set of all beads flanking rDNA repeat regions. 
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Figure 1 . Population-based analysis of the 5. cerevisiae genome organization. To analyze structural features of the genome, we defined an optimization 
problem with three main components. (Top panels) A structural representation of chromosomes as flexible chromatin fibers (center), a structural rep- 
resentation of the nuclear architecture (left), and the scoring function quantifying the genome structure's accordance with nuclear landmark constraints 
(right). (Middle panels) An optimization and sampling method, which minimizes the scoring function to generate a population of genome structures that 
entirely satisfies all landmark constraints. (Bottom panels) The statistical analysis and comparison of structural features from the population of 3D genome 
structures with all the experimental data. 



(e.g., chromosome 4, whose size is 1.5 Mb), the LPD is highest in the 
central region of the nucleus again along the central axis. 

We then ask what factors are responsible for the chromo- 
somes' preferred locations. For each chromosome, we calculate a new 
structure population for a nucleus containing only a single chro- 
mosome but otherwise constrained in a manner identical to the 
full simulation (i.e., the single chromosome population) (Fig. 2C). 
Comparing the two structure populations reveals great differences 
for each chromosome location (Fig. 2D). For example, in the full 
simulation, large chromosomes reside substantially farther from 
the SPB region toward the nucleolus than would be expected based 
on chromosome tethering alone. The differences are caused by a vol- 
ume exclusion effect: Because of tethering, the chromosomes must 
compete for the limited space around the SPB. Smaller chromosomes 
are naturally more restricted to regions closer to the SPB, which in turn 
tends to exclude parts of larger chromosomes from these regions. For 
smaller chromosomes, the opposite effect is observed; in the full 
simulation, they exhibit an increased probability density around the 
SPB (Supplemental Fig. 1). Importantly, due to the volume exclusion 
effect, the prefened location of a chromosome is not defined by 
tethering alone but also depends on the total number and lengths of 
all other chromosomes in the nucleus. 

Genome-wide chromosome contact patterns 

Next, we measure how often any two chromosome chains come 
into contact with each other over the entire structure population. 
Interestingly, most chromosomes show distinct preferences for 



interacting with certain others. For instance, chromosome 1 has 
a significantly higher chance of interacting with chromosomes 3 
and 6 than with any other chromosome. Its interactions with the 
large chromosomes 4, 7, and 12 are substantially depleted (Fig. 3A). 
Strikingly, almost identical chromosome interaction preferences 
are observed in an independent genome-wide chromosome con- 
formation capture experiment (Fig. 3A; Supplemental Fig. 2A; Duan 
et al. 2010). Pearson's correlation between the chromosome-pair 
contact frequencies in our structure population and those 
detected in the experiment is 0.94 (P < 10~ 15 ). In the random control, 
the contact frequencies do not display any significant chromosome- 
pair contact preferences (Pearson's correlation between experimen- 
tal data and the random control is -0.57) (Supplemental Fig. 2B). 

Next, we compare contact frequencies for all possible pairings 
of the 32 chromosome arms (Fig. 3B,C). It is evident that some 
pairs of chromosome arms have a greater propensity to interact 
than others. In particular, chromosome arms with <500 kb (chro- 
mosomes 1, 3, 5, 6, 8, and 9) are more likely to interact with each 
other than longer arms. For instance, the short arm of chromo- 
some 1R is almost eight times more likely to interact with the short 
arm of chromosome 3L than with the long arm of 4R. Also these 
observations are in almost complete agreement with the confor- 
mation capture experiments (Pearson's correlation coefficient of 
0.93, P < 10" 15 ) (Fig. 3C,D; Duan et al. 2010). 

Finally, when chromatin contacts are analyzed at a resolution 
of 32 kb, the contact frequency heat map of the structure pop- 
ulation shows highly organized cross-shaped patterns (Fig. 3E). 
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Figure 2. Chromosome locations. (A) A sample of 40 chromosome 
configurations randomly selected from the structure population for the 
small chromosome 1 (left panel, blue chains), the large chromosome 4 
(middle panel, green chains), and the medium-sized chromosome 8 (right 
panel, gray chains). The chain thickness is reduced to enhance visibility. 
The chromosomes are depicted in the nucleus with the SPB in pink, the 
nucleolus in dark blue, and the NE in light blue. (B, top) Chromosome 
localization probability densities (LPDs) of chromosomes 1 (left), 4 (mid- 
dle), and 8 (right panel) plotted along the two principal axes p and z 
(Supplemental Material). (Lower left) Reference frame for projecting the 
positions of chromosome points onto the two principal axes, namely, the 
projection along the central axis z (connecting SPB, nuclear center, and 
nucleolus), and the radial distance p-axis indicating the absolute distance 
of a point from the central axis. (C) The LPD of chromosome 4 resulting from 
the "single chromosome population." The chromosome is subject to all 
landmark constraints, but structures are generated without the presence of 
other chromosomes in the nucleus. The density distribution is significantly 
different from the situation when all chromosomes are present (see B). (D) 
Excluded volume effect. The difference map between the LPDs of chro- 
mosome 4 from the structure population when all chromosomes are present 
(B, middle) and the single chromosome population as defined in C. 



Also these patterns are in excellent agreement with those observed 
in the conformation capture experiment (Fig. 3F; Duan et al. 2010). 
The two contact frequency maps are again strongly correlated, with 
an average row-based Pearson's coefficient of 0.94 (all P-values 
<10~ 6 ; Supplemental Material). In contrast, the contact frequency 
map generated from the random control lacks the cross-shaped 
patterns (Supplemental Fig. 2C). We now analyze the intra- and 
interchromosomal locus-pair interaction patterns in more detail. 

Intrachromosomal locus-locus Interactions 

Intrachromosomal contact patterns in the structure population 
and experiments can be divided into three regions (Fig. 4A). Contact 
frequencies are enriched between regions in the same chromosome 
arm, as expected for a constrained random polymer chain (blocks c 
in Fig. 4A). In general, contact frequencies between regions within 
the same chromosome arm increase with decreasing sequence sepa- 
ration, which is shown by the strong diagonal in the contact fre- 
quency maps (Fig. 4A). However, regions located close to the cen- 
tromere behave very differently Contacts between subcentromeric 
regions on opposite sides of the centromere are clearly enriched in 
frequency, even with increasing chain distance, as can be seen along 



the line perpendicular to the main diagonal of the contact frequency 
map (block b in Fig. 4A). Moreover, contact frequencies between 
subcentromeric regions and regions from the bulk of both chro- 
mosome arms are very low (blocks a in Fig. 4A). 

Similar contact patterns have been reported in 3C confor- 
mation capture experiments and have been explained by a partic- 
ular Rabl-like style of chromosome folding (Dekker et al. 2002). 
The hypothesis is that regions on opposite sides of the centromere 
are folded toward each other, possibly indicating the existence of 
a biochemical attraction between loci (Fig. 4B). However, the strong 
agreement between the experimental contact frequency maps and 
our structure population demonstrates that such contact patterns 
are not necessarily caused by specific biochemically mediated in- 
teractions between subcentromeric regions. An equally possible 
explanation is that they represent purely random encounters of 
constrained chromosome chains. 

It remains to be determined which factors are most responsible 
for the folding. In the "single chromosome population," the cross- 
shaped intrachromosomal contact pattern is lost; the contact fre- 
quency map is similar to the random control (Fig. 4A, bottom 
panels). Therefore the particular folding pattern illustrated in Figure 
4B is caused by a volume exclusion effect as a result of the presence 
of all 16 chromosomes. The competition among all centromere- 
tethered strands for the limited space around the SPB naturally leads 
to the style of folding described by experiments, and this folding is 
the proximate cause of the enriched contact frequencies between 
centromeric regions and the observed shielding of these regions 
from chromosome arm interactions. 

Interchromosomal locus-locus contacts 

The interchromosomal contact frequencies in the structure pop- 
ulation are correlated with those observed in experiments, with 
an average Pearson's correlation of 0.54, which is highly signifi- 
cant (P < 10" 15 ) (Fig. 3E,F; Supplemental Fig. 2D). In contrast, the 
Pearson's correlation between the random control and experiments 
is close to nil, and the distinctive contact patterns in the experi- 
mental data are completely absent in the random control (Supple- 
mental Fig. 2C). 

To examine the effect of limited sampling on the accuracy of 
chromosomal contact patterns, we compared our initial contact 
frequency map to maps generated from randomly sampling differ- 
ent proportions of these contacts (Supplementary Material; Sup- 
plemental Fig. 3D). In contrast, to intrachromosomal contacts, the 
correlations between interchromosomal contact patterns are greatly 
affected by limited sampling. At a sampling rate of 0.1%, we find 
that the Pearson's correlation between the two interchromosomal 
contact maps (even when assuming an ideal model) cannot exceed 
0.5. Similar correlation values are observed in the Hi-C experiment 
when two interchromosomal contact maps are compared that are 
generated by using two different restriction enzymes (Yaffe and 
Tanay 2011). In our analysis, the observed correlation value of 0.54 
corresponds to a sampling rate of —0.2%, which is also the order of 
magnitude that is expected for the experiment (Duan et al. 2010). 
Thus, the observed correlation coefficient of 0.54 represents a re- 
markably good agreement between the interchromosomal contact 
patterns, given that the experimental and computational samplings 
are finite and cannot be exhaustive. 

Gene localizations 

We now focus on the nuclear locations of individual gene loci. The 
locations of eight genes have been determined by large-scale fluo- 
rescence imaging experiments (Berger et al. 2008). These locations 
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allowing for a direct comparison with 
fluorescence experiments (Berger et al. 
2008) (Fig. 5A). The density distribution 
functions agree well with experiments, in 
that each locus occupies a well-defined ter- 
ritory. The volumes and shapes of these 
territories strongly resemble those observed 
in experiments (Berger et al. 2008). For 
instance, genes GAL2, HMOl, and 
SNR17A are located near the nucleolus in 
the structure population, as seen in the 
experiment. Interestingly, the structure 
population places SNR17B (no experi- 
ments available) and SNR17A in similar 
positions near the nucleolus, despite the 
fact that these genes are located on differ- 
ent chromosomes. Both of these genes 
are involved in ribosome biogenesis and 
code the snoRNA U3. Also in agreement 
with experiments, the distribution pat- 
terns of the functionally related genes 
RPS5 and RPS20 are quite different. For 
instance, RPS5 positions are significantly 
more diffuse. 

In order to compare quantitatively 
the relative positions of these eight genes, 
we measure their median distance along 
the central axis in the 2D density maps 
obtained from experimental data and in 
the structure population. These positions 
are in excellent agreement (Pearson's cor- 
relation is 0.95, P < 10" 3 ) (Fig. 5B). 



Figure 3. 



Chromosome and gene loci interactions. (A) Histogram of the normalized contact frequencies 
of chromosome 1 with other chromosomes in the structure population (black bars), chromatin confor- 
mation capture experiments (gray bars) (Duan et al. 201 0), and the random control population (white bars). 
Contact frequencies of other chromosomes are shown in Supplemental Fig. 2. (B) Comparison of chro- 
mosome-arm-chromosome-arm contact frequencies from the structure population and experiments (Duan 
et al. 201 0). (C,D) Contact frequency heat maps for chromosome arm contacts in the structure population 
(C) and experiments (D) (Duan et al. 201 0). Heat maps of the genome-wide contact frequencies between 
loci at 32-kb resolution determined from the structure population (f ) and in a chromosome conformation 
capture experiment (F) (Duan et al. 201 0). (Color code ranges from white for low to red for high values.) 
Centromere positions are marked by the ticks. The row-based average Pearson's correlation between the 
two heat maps is 0.94 (all P-values <1 0 6 ). The largest differences between both heat maps involve in- 
teractions to the small arm of chromosome 1 2, which is not surprising because it contains all of the rDNA 
genes located in the nucleolus, which are not explicitly treated in our simulation. To further improve these 
interactions it is necessary to include these regions in future simulations (see Supplemental Material). 



are measured with respect to the two principal axes of the nucleus 
(Methods; Fig. 5A). We determined the two-dimensional (2D) density 
distributions of the same gene loci in our structure population, 



Pairwise telomere distances 

It is well known that telomeres are not 
positioned randomly on the nuclear pe- 
riphery (Gotta et al. 1996; Bystricky et al. 
2005; Berger et al. 2008; Therizols et al. 
2010). Fluorescence imaging has revealed 
that the distance between any two sub- 
telomeres increases gradually with 
the arm lengths of their chromosomes 
(Therizols et al. 2010). For a given sub- 
telomere, this relationship is linear. In the 
structure population, we observe a very 
similar behavior. More specifically, after 
applying a change point analysis (Zeileis 
et al. 2003), we find that the distance 
between subtelomere pairs as a function 
of arm length is divided into two linear 
regimes (Fig. 6). For chromosome arms 
with lengths up to —360 kb, the distances 
observed in our structure population in- 
crease with a relatively steep slope. Above 
360 kb, the slope decreases significantly. 
This behavior is entirely consistent with 
experiments, and the change in slope has 
been explained as follows (Therizols 
et al. 2010). For small arms, the accessible 
position of a subtelomere at the NE is entirely restricted by the arm 
length. Hence, the median distance between two subtelomeres in- 
creases rapidly with their accessible areas. However, at a certain arm 
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Figure 4. Chromosome folding. (A) Heat maps showing intrachromosomal contact frequencies for 
chromosome 4 obtained from conformation capture experiments (top left) (Duan et al. 201 0), structure 
population (top right), random control (bottom left), and single chromosome population (bottom right). 
The latter is derived from a structure population for a nucleus containing only chromosome 4, con- 
strained in a manner identical to the full simulation. Heat maps of the experiment and the structure 
population show characteristic folding patterns reminiscent of the back-folding of subcentromeric re- 
gions onto themselves. The heat maps of the random control and the single chromosome population 
lack the characteristic pattern. (B) Scheme showing the particular back-folding of the regions adjacent to 
both sides of the centromere for several chromosome configurations. 



length, the subtelomere is able to reach all points on the NE. Further 
increases in arm length do not dramatically increase the median 
subtelomere distance. 

Interestingly, the change in slope occurs at a slightly differ- 
ent arm length in experiments (—310 kb) (Therizols et al. 2010) 
compared to our structure population (356-440 kb) (Supplemental 
Fig. 5a). However, the incompleteness of the experimental data can 
explain some of this difference. If we only include those chro- 
mosomes that are also analyzed in the experiment (Supplemental 
Fig. 5b), the change in slope in our simulation shifts to 309-327 kb 
in remarkably good agreement with the experiment. 

Telomere clusters 

To identify subtelomere clusters in the structure population, for 
each subtelomere we calculated the fraction of structures with at 
least one other subtelomere within 250 nm. In agreement with 
another experiment (Therizols et al. 2010), such small subtelomere 
distances are infrequent (l%-3%) for most chromosomes (Sup- 
plemental Fig. 6A). However, a few co-locations are observed more 
frequently between relatively short chromosome arms, namely, 
1R:1L, 6R:6L, 1R:9R, and 3L:3R. These pairings occur in 12%-20% 
of the population, also in agreement with experiments (Bystricky 
et al. 2005; Schober et al. 2008; Therizols et al. 2010). For example, 
the pairs 3R:3L and 6R:6L were recently reported to form signifi- 
cant but transient interactions, leading to the formation of chro- 
mosome loops (Bystricky et al. 2005; Schober et al. 2008; Therizols 
et al. 2010). In general, we find that as the length difference be- 
tween chromosome arms grows more pronounced, the probability 
of their telomeres being co-localized decreases. Interestingly, the 
volume exclusion effect has a pronounced effect on the co-location 
frequency of telomeres. For small and also large chromosomes, the 
volume exclusion effect increases significantly the co-location 



frequency, while for medium-sized chro- 
mosomes the opposite is observed by 
decreasing the co-location frequency 
(Supplemental Fig. 6B). For instance, the 
fraction of co-located telomeres increases 
by almost 20% for the small chromosome 6 
upon the presence of all other chromo- 
somes in the nucleus, while it decreases by 
60% for the medium-sized chromosome 8. 

Co-localization of functionally related loci 

Next, we investigate whether function- 
ally related gene loci are co-localized in 
the structure population. First, we com- 
pare the 3D spatial distributions of early 
and late replication start sites in the 
structure population. These sites are dis- 
tributed across all chromosomes (Fig. 7, 
right panels). Experimental evidence ex- 
ists that early replication sites are spatially 
clustered during interphase (Di Rienzi 
et al. 2009; Duan et al. 2010). 

In each structure of the population, 
we calculate the mean pairwise distance 
between all early replication sites. The 
frequency distribution derived from these 
mean pairwise distances is compared to 
a distribution chosen from randomly se- 
lected sites in the genome. We observe 
significant spatial clustering of the early replication sites (Fig. 7A), 
in the sense that their mean pairwise distances are significantly 
less than would be expected from randomly selected sites (Stouff- 
er's Z- transform [Stouffer et al. 1949] tests z-scores < -160; Sup- 
plemental Material). This observation holds for all three sets of 
early replication origins identified in the literature (Feng et al. 
2006; McCune et al. 2008; Sekedat et al. 2010). Remarkably, for late 
replication sites we see the opposite effect: a statistically signif- 
icant increase in the mean pairwise distances between late repli- 
cation sites compared with the background. It appears that, on 
average, early replication start sites are closer to the centromere on 
the chromosome sequence compared with the late start sites (all 
P- values <10~ 5 for the three data sets) (Supplemental Material). 

We also analyzed the spatial positions of all tRNA gene loci in 
the genome, which have been observed to cluster in experiments 
(Thompson et al. 2003; Duan et al. 2010). Again, we observe a sta- 
tistically significant decrease in the pairwise distances between tRNA 
loci (Fig. 7B) compared with randomly picked loci. 

Our observations clearly indicate that the chromosomal lo- 
cations of these specific sites are not randomly distributed over the 
genome; they are positioned in such a way that early replication 
sites have a higher probability of being co-localized when the chro- 
mosome chains behave as random polymer chains that are subject to 
nuclear landmark constraints. 



Discussion 

In this paper, we demonstrate that purely random configurations 
of tethered chromosomes reproduce in a statistical manner all the 
experimental hallmarks of genome organization in Saccharomyces 
cerevisiae. Specifically, random configurations generate structural 
features that agree remarkably well with (1) the highly specific 



1 300 Genome Research 



www.genome.org 



Principles of 3D genome organization in yeast 




Figure 5. Gene territories. (A) Projected localization probability densi- 
ties for the positions of eight gene loci in the structure population. The 
probability densities are determined with respect to two principal axes of 
the nuclear architecture (top right panel). The z-axis connects the SPB with 
the origin at the nuclear center and the nucleolus. The radial axis p defines 
the distance of a point from the central axis (top right panel). The lower 
half of the projected localization density plot is mirrored from the top half 
for visual convenience. (B) Median gene loci position along the z-axis 
calculated from the projected probability localization densities in A and 
from the experiment (Berger et al. 2008). The two are highly correlated 
with R 2 = 0.9. To allow for comparison with the experiment, the z-axis 
distance of a gene locus is normalized relative to the SPB-gene distance. 

interaction patterns between individual chromosomes, chromo- 
some regions, and chromosome folding patterns obtained by 
genome-wide conformation capture experiments (Duan et al. 
2010); (2) the emergence, shape, and position of individual gene 
territories revealed by probability maps from fluorescence experi- 
ments (Berger et al. 2008); (3) the distribution of median distances 
between telomeres; (4) the relative frequencies of telomere co- 
locations observed in imaging experiments (Bystricky et al. 2005; 
Schober et al. 2008; Therizols et al. 2010); and even (5) the physical 
proximity of functionally related gene loci, including early repli- 
cation sites and tRNA gene loci. 

In addition to chromosome tethering, the main organizing 
factor is a volume exclusion effect, as a result of the competition of 
all the chromosomes for the limited nuclear space. The fact that 
the chromosome arms have different lengths gives rise to impor- 
tant nuances of organization and implies that the locations of a gene 
or chromosome territory depends on all the other chromosomes. 
Therefore, the gene territory position and specific interaction pat- 
terns of a given gene locus is determined not only by its chromosome 
sequence position and the arm lengths of its own chromosome but 
also by the total number and the relative arm lengths of all other 
chromosomes. The volume exclusion effect can even create coun- 
terintuitive effects. For instance, for small and large chromosomes 
the volume exclusion effect leads to an increase in the frequency 
with which subtelomeres on the same chromosome are in proximity 
to each other, while for medium-sized chromosomes a decrease is 
observed. 

Our findings have several important consequences. First, we 
show that a small number of purely geometrical constraints on 
otherwise randomly configured chromosomes can lead to a highly 
structured 3D genome organization. Second, the hallmarks of 



genome organization can be explained without calling on specific 
molecular interactions between chromatin regions or chromatin- 
bound proteins. For instance, random chromosome encounters 
can also statistically explain the spatial features often attributed 
to an apparent Rabl-like chromosome folding, which refers to the 
back-folding of subcentromeric chromosome regions so that chro- 
mosome arms appear juxtaposed. This pattern is mainly caused by 
the volume exclusion effect (Fig. 4). In response to the competition 
for the limited space around the SPB, chromosome regions on both 
sides of the centromere show a statistical preference for bending 
toward each other. When averaged over the entire cell population, 
this tendency gives rise to the distinctive cross-shaped intra- 
chromosomal contact patterns observed in experiments and in our 
structure population. However, most individual structures will not 
exhibit simultaneously all the features of such an apparent Rabl-like 
fold. We therefore emphasize that the data should be explained as 
a statistical preference for chromosome contacts but not necessarily 
be interpreted as a stable chromosome fold. An interesting predi- 
cation of this model is that the Rabl-like subcentromeric contact 
pattern should not be expected in yeast species if the number of 
chromosomes was considerably smaller even if the chromosomes 
were all tethered to nuclear landmarks. Although S. pombe and 
5. cerevisiae have similar genome sizes, the former has only three 
chromosomes. The prediction is sustained: In genome-wide 
conformation capture experiments, Schizosaccharomyces pombe 
yeast does not show the cross-shaped intrachromosomal con- 
tact patterns characteristic of this type of folding (Tanizawa 
et al. 2010). 

Another remarkable result is that the early replication sites in 
our structure population have a high probability of being in close 
proximity compared with the background distribution of pairwise 
separations. In contrast, late replication sites have a lower proba- 
bility of being colocated compared with randomly selected sites. 
This difference may help regulate a naturally occurring order on 
replication timing. The existence of these and other co-location 
patterns may indicate that the relative positions of affected loci in 
the chromosome were selected by evolution. Due to excluded 
volume effects, the spatial position of a gene in the nucleus is not 
only modulated by its relative sequence position in its own chro- 
mosome, but also by the relative arm lengths and the total number 
of all other chromosomes in the nucleus. 

We also note that our study provides additional evidence for 
the existence of a chromatin fiber in the yeast interphase nucleus 
with length and density properties similar to the 30-nm fiber. We 
created an alternative structure population consistent with a 10-nm 
chromatin fiber, and the statistical results do not agree with the 
described experimental evidence. 

Finally, we believe that our results point toward a considerable 
structural variability of genome structures among individual cells. 
Each structure in our population not only differs considerably 
from the "average conformation " but also from all the other struc- 
tures in the population (<0.3% of loci contacts are shared between 
any two structures; Supplemental Material). No single-genome 
structure or population-averaged structure is representative of the 
population. Although the true structural variability is unknown, our 
results indicate that a single structural model cannot adequately 
reflect all the spatial features of the genome. It is crucial to analyze 
genome structures from a statistical rather than an individual 
standpoint. Structural analysis should be performed by generating 
a population of 3D genomes, which represent the spectrum of all 
possible chromosome configurations consistent with the data. The 
structural organization of the genome can then be interpreted sta- 
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Figure 6. Median telomere-telomere distances in the structure population. The median distances 
between a telomere of a reference chromosome arm and all other telomeres are plotted for reference 
chromosome arms (A) 6R, (B) 10R, (C) 7R, and (D) 4R. (Vertical dashed line) Change point with 95% 
confidence interval shown by the shaded area. 



Nuclear architecture 

The nuclear radius is set to 1 micron, as 
suggested by experiments (Gasser 2002; 
Chubb and Bickmore 2003; Berger et al. 
2008; Meister et al. 2010). The relative 
position and size of the SPB and nucleolus 
are taken from imaging experiments 
(Berger et al. 2008). The SPB and nucleolus 
are located at opposite ends of the nucleus, 
while a central axis connects the centers of 
the SPB, nucleus, and nucleolus (Fig. 1). 

Scoring function 

The scoring function is defined as a sum 
of spatial restraints and quantifies the 
degree of consistency between the struc- 
ture and the imposed landmark constraints 
derived from experimental information. To 
optimize the structure, the scoring func- 
tion is minimized to a score of zero. The 
scoring function is written as 

N-l 

S(r lV .,r N ) = X U ch (r h r i+1 ) 

N-l N 

+ E Ztfexc(r z -,r ; -) 

i = 1 / > i 
N 

+ E £W(r z -)+ £ U cen {ri) 

i = l z'e/3 
+ X ^telO/)+ £ l/inuO;) 

+ £ tfonu(r f )=0, 



tistically as a sample drawn from a population of heterogeneous 
structures. Such an analysis will allow a more accurate statistical 
description of genome structures and pave the way for structure- 
function analysis in the future. 

Methods 

General description 

The nuclear architecture is described by the NE, the SPB, the nu- 
cleolus, and the 16 individual chromosomes in the haploid yeast 
genome (Fig. 1). The positions of the NE, SPB, and nucleolus 
remain fixed, while the configurations of the chromosomes are 
optimized. 

Chromosome representation 

The 16 chromosomes are described as flexible chromatin fibers, 
which in turn are represented as chains of connecting spheres with 
a radius of 15 nm. The compaction ratio is set to six nucleosomes 
per 1 1 nm of length. This figure agrees with other experiments, 
which have measured the compaction ratio to be between 1.2 and 
1 1 nucleosomes per 1 1 nm of length (Thoma et al. 1979; Gerchman 
and Ramakrishnan 1987; Bystricky et al. 2004; Dekker 2008). Thus, 
each bead accommodates —3.2 kb of genome sequence. The —12 
Mb yeast genome is represented by a total of 3779 beads. Changing 
the compaction ratio will slightly change the total number of beads 
but will not affect the outcome of the calculations at the resolution 
of our analysis. 



where r z - e 3£ 3 is the coordinate vector of bead i, and N is the total 
number of beads in a model. The restraints are expressed as pseudo 
potential energy terms u described in Table l;a,(3,y, and 8 are subsets 
of specific beads in the chromosome chains that share certain prop- 
erties. More specifically, a is the set of beads assigned to the last bead 
of every chain, (3 is the set of beads assigned to centromeres, y is the 
set of beads assigned to telomeres, and 8 is the set of all beads flanking 
rDNA repeat regions. 

Chromatin chain restraint U ch 

Two consecutive beads in a chromosome chain are restrained to be 
at a distance of 30 nm (Table 1). 

Chromatin chain excluded volume restraint U exc 

Overlap between beads is prevented by imposing excluded volume 
restraints for all bead pairs (Table 1). 

NE restraint U nuc 

All chromatin beads must remain within the nucleus, defined as 
a sphere with radius R nuc = 1 micron (Table 1). 

Centromere localization restraint U cen 

All the centromeres cluster near the SPB, which is the microtubule 
organization center in the yeast nucleus (Jin et al. 2000). The cen- 
tromeric regions are attached to the SPB through microtubules up to 
300 nm in length (OToole et al. 1999). Accordingly, all beads rep- 
resenting centromeric regions are restricted to a spherical volume 
with a radius 300 nm, centered on the SPB (Fig. 1). We follow ex- 
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Figure 7. Spatial clustering of replication origins and tRNA gene loci. (Left) The histograms show the 
distribution of the mean pair distance ratio between a set of specific sites (e.g., early replication sites) and 
all sites in the structures of the population. The histograms are generated as follows: For a given structure 
in the population, the mean pair distance between a set of specific loci (e.g., all early replication origins) is 
calculated. This distance is divided by the mean pair distance of all sites in the same structure. The 
distribution of the distance ratio is then obtained from all structures in the population. If the distribution is 
centered on 1 (vertical dashed line), the selected sites behave similarly to a random sample of all sites. If 
the distribution is shifted toward smaller values, their pair distances are smaller than would be expected 
from the background control. If the distribution is shifted to larger values, the selected sites are more 
distant from each other than would be expected from the random control background. (A) Distribution 
of distance ratios of early (red) and late (green) replication start sites as determined by three independent 
experiments using the CDR (Clb5 Dependent Region), Rad53 checkpoint-mediated regulation, and 
GINS complex as identifiers. For the latter case, category 1 sites start replication earlier than category 2 
sites. (Right panel) The positions of each site in the chromosome sequence. The number of early- and 
late-firing sites labeled with CDR, Rad3, and GINS are 77 and 123, 101 and 99, and 169 and 135, 
respectively. (B) Distribution of distance ratios for all 275 tRNAs (loci extracted from SGD, http:// 
www.yeastgenome.org). For all sets of sites in A and B, the shift of the mean pair distances is highly 
statistical significant (for details, see Supplemental Material and main text). 



able for the rDNA genes, we do not ex- 
plicitly resolve the chromatin fiber within 
the nucleolus. Instead, we anchor the two 
beads at the beginning and end of the 
rDNA repeat region (i.e., positions 458 kb 
from the left telomere and 620 kb from 
the right telomere in the sequence of 
chromosome 12) to the surface of the 
nucleolus (Table 1; Fig. 1). 

Nucleolus excluded volume restraint U onu 

All chromosomal regions except those 
containing rDNA repeats are excluded 
from the nucleolus. (Table 1). 

Chromatin persistence length 

During the optimization process, we im- 
posed an angular restraint between each 
set of three consecutive beads to reproduce 
the desired chain stiffness. The constraint 
is expressed as a harmonic potential: 



U, 



1, 



angle : 



K angle 



|r i+ i-r z -| |r i+2 -r i+ i| 



for i, i + and i + 2 on the same chain. 

This restraint is considered only when 
calculating gradient forces during the op- 
timization process. It makes no contribu- 
tion to the total score of a model (below). 
With a force constant of fcangie = 0.2 kcal/ 
mol, we obtain chromatin chains that be- 
have like random polymers with a persis- 
tence length between 47 and 72 nm (the 
average is 61.7 ± 7.7 nm) (Supplemen- 
tal Fig. 7), consistent with experiments. 
Estimated values for the persistence length 
from experiments fall between 30 and 220 
nm (Cui and Bustamante 2000; Bystricky 
et al. 2004; Langowski 2006). 



perimental evidence from fluorescence imaging and place the cen- 
tromere localization volume on the central axis, close to the NE (Fig. 
1, scheme in top right panel; Berger et al. 2008; Therizols et al. 2010). 

Telomere localization restraint U tej 

Telomeres have a high probability to be located near the nuclear 
periphery (Berger et al. 2008; Therizols et al. 2010). Beads repre- 
senting telomeres are positioned in the vicinity of the NE (Table 1; 
Fig. 1, thin gray shell of 50-nm thickness). 

Nucleolus localization restraint U inu 

The rDNA is located on chromosome 12 and consists of 150-200 
tandem repeats of 9.1 kb length each (Kim et al. 2006; Taddei et al. 
2010). All rDNA regions are found in the nucleolus. Because no 
conformation capture data or fluorescence imaging data are avail- 



Optimization 

The optimization is performed using 
a combination of simulated annealing 
molecular dynamics and the conjugate 
gradient methods implemented in the Integrated Modeling 
Platform (IMP; http://www.integrativemodeling.org) (Alber et al. 
2007a,b, 2008; Russel et al. 2012). An individual optimization starts 
with an entirely random bead configuration, followed by an initial 
optimization of the structure. Next, we apply simulated annealing 
protocols to entirely equilibrate the genome configuration. Finally, 
conjugate gradient optimization ensures that all constraints are satis- 
fied, leading to a structure with score zero. Many independent op- 
timizations are carried out to generate a population of 200,000 
genome structures with a total score of zero, hence consistent with 
all input data. A comparison between the frequency maps of two 
independently calculated populations, each with 100,000 struc- 
tures, showed that our genome structure population is highly re- 
producible (Pearson's correlation between the contact frequency 
maps of the two populations is 0.999). 
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Control population 

We also generated a control population of 25,000 structures 
without chromosome tethering constraints and nucleolus ex- 
cluded volume constraints. Otherwise, the chromosomes are 
constrained in a manner identical to the full simulation. 

Analysis 

The analysis of the structure population and all statistical tests are 
described in great detail in the Supplemental Material. 
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